Background: Zipf's discovery that word frequency distributions obey a power law established 
parallels between biological and physical processes, and language, laying the groundwork for a 
complex systems perspective on human communication. More recent research has also identified 
scaling regularities in the dynamics underlying the successive occurrences of events, suggesting the 
possibility of similar findings for language as well. 

Methodology/Principal Findings: By considering frequent words in USENET discussion 
groups and in disparate databases where the language has different levels of formality, here we show 
that the distributions of distances between successive occurrences of the same word display bursty 
deviations from a Poisson process and are well characterized by a stretched exponential (Weibull) 
scaling. The extent of this deviation depends strongly on semantic type - a measure of the logicality 
of each word - and less strongly on frequency. We develop a generative model of this behavior that 
fully determines the dynamics of word usage. 

Conclusions/Significance: Recurrence patterns of words are well described by a stretched 
exponential distribution of recurrence times, an empirical scaling that cannot be anticipated from 
Zipf's law. Because the use of words provides a uniquely precise and powerful lens on human thought 
and activity, our findings also have implications for other overt manifestations of collective human 
dynamics. 

Cite as: PLoS ONE 4 (11): e7678 (2009), doi:10.1371/journal.pone.0007678 



INTRODUCTION 

Research on the distribution of time intervals be- 
tween successive occurrences of events has revealed cor- 
respondences between natural phenomena on the one 
hand [UH] and social activities on the other hand [3JH1IS]. 
These studies consistently report bursty deviations both 
from random and from regular temporal distributions of 
events [6]. Taken together, they suggest the existence of 
a dynamic counterpart to the universal scaling laws in 
magnitude and frequency distributions |5J [HI EH EQ ■ 
Language, understood as an embodied system of representation and communication [12], is a particularly interesting and promising domain for further exploration, 
because it both epitomizes social activity, and provides 
a medium for conceptualizing natural and biological re- 
ality. 

The fields of statistical natural language processing 
and psycholinguistics study language from a dynamical 
point of view. Both treat language processing as en- 
coding and decoding of information. In psycholinguis- 
tics, the local likelihood (or predictability) of words is 
a central focus of current research [13]. Many widely 
used practical applications of statistical natural language 
processing, such as document retrieval based on key- 
words, also exploit dynamic patterns in word statis- 
tics [TUJ [HI IS]- Particularly important for these ap- 
plications, and also noticed in different contexts [TBI HZ1 
HU [El EQl ETJ , is the non-uniform distribution of content 
words through a text, suggesting that connections to the 
previous discoveries about inter-event distributions may be revealed through a systematic investigation of the recurrence times of different words. 

With the rise of the Internet, large records of spontaneous and collective language are now available for scientific inquiry [22, 23, 24], allowing statistical questions about language to be investigated with unprecedented precision. At the same time, large-scale text mining and document classification are of ever-increasing importance [25]. The primary datasets used in our study are USENET discussion groups available through Google (http://groups.google.com). These exemplify spontaneous linguistic interactions in large communities over a long period of time. We first focus on the $N = 2,128$ words that occurred more than 10,000 times between Sept. 1986 and Mar. 2008 in a ($2 \times 10^8$-word) discussion group, talk.origins. The data were collated chronologically, maintaining the thread structure (see Text S1, Databases). 

Here, we show that long-time word recurrence patterns follow a stretched exponential distribution, owing to bursts and lulls in word usage. We focus on time scales that exceed the scale of syntactic relations, and the burstiness of the words is driven by their semantics (that is, by what they mean). The burstiness of physical events and socially contextualized choices makes words more bursty than an exponential distribution. However, we show that words are typically less bursty than other human activities [26] due to their logicality or permutability [27, 28], technical constructs of formal semantics that index the extent to which the meanings and usage of words are stable over changes in the discourse 
FIG. 1: Recurrence time distributions for the words theory (red) and also (blue) in the USENET group talk.origins, a discussion group about evolution and creationism. Both words have a mean recurrence time of $\langle\tau\rangle \approx 820$. (a) Linear-logarithmic representation of $f(\tau)$, showing that the decay is slower than the exponential $\beta = 1$ prediction (1) (black dashed line) and follows closely the stretched exponential distribution (2) with $\beta = 0.46$ ($R^2 = 0.9984$) for theory and $\beta = 0.85$ ($R^2 = 0.9999$) for also. For comparison, $\beta = 1$ yields $R^2 = 0.49$ for the word theory and $R^2 = 0.9904$ for the word also (see Text S1, Fitting Procedures). The inset in (a) shows a magnification for short times. A word-dependent peak at $\tau < 50$ reflects the domination of syntactic effects and local discourse structure at this scale. (b) Cumulative distribution function $F(\tau)$ in a scale in which the stretched exponential (2) appears as a straight line. The panels in the inset show 100 occurrences (top to bottom): of the word theory, of the word also, and of a randomly distributed word ($\beta = 1$). (c) The probability of word usage $m(t)$ for the words theory and also. The data are binned logarithmically and the straight lines correspond to Eq. (4). (d) Illustration of the generative model for the usage of individual words when $\beta = 0.4$, where the spikes indicate the times at which the word is used. The probability $m(t)$ of using a word decays as a piece-wise power-law function since its last use, as determined by Eq. (4). The Poisson case corresponds to constant $m$. The panels at the bottom show 100 occurrences of words generated by the model for $\beta = 0.4$ and $\beta = 0.8$. 



context. Our quantitative analysis of the empirical data 
confirms the inverse relationship between burstiness and 
permutability. The model we develop to explain these ob- 
servations shares the generative spirit of local (n-gram) 
and weakly non-local models of text classification and 
generation [551 EM EI]. However, it focuses on long time scales, picking up at temporal scales where studies of local predictability and coherence leave off [TS]. We verify the generality of our main findings using different 
databases, including books of different genres and a series 
of political debates. 

RESULTS AND DISCUSSION 

We are interested in the temporal distribution of each word $w$. All words are enumerated in order of appearance, $i = 1, 2, \ldots, N$, where $i$ plays the role of the time along the text. The recurrence time $\tau_j^w = i_{j+1}^w - i_j^w$ is defined by the number of words between two successive uses ($i_j^w$ and $i_{j+1}^w$) of word $w$ (plus one). For instance, the first appearances of the word the in the abstract above are at $i_1^{the} = 22$, $i_2^{the} = 41$, $i_3^{the} = 44$, $i_4^{the} = 50$, ..., leading to a sequence of recurrence times $\tau_1^{the} = 19$, $\tau_2^{the} = 3$, $\tau_3^{the} = 6$, .... We are interested in the distribution $f_w(\tau)$ of $\tau = \tau_j^w$, $j = 1, \ldots, N_w$. The mean recurrence time, called by Zipf the wavelength of the word [7], is given by $\langle\tau_w\rangle \approx N/N_w = 1/\nu_w$ [5] (hereafter we drop $w$ from our notation). It is mathematically convenient to consider $\tau$ to be a continuous time variable (an assumption that is justified by our interest in $\tau \gg 1$) and to use the cumulative probability density function defined by $F(\tau) = \int_\tau^\infty f(\tau')\,d\tau'$, which satisfies $F(0) = 1$ and $\int_0^\infty F(\tau)\,d\tau = \int_0^\infty \tau f(\tau)\,d\tau = \langle\tau\rangle = 1/\nu$. 
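
The definition of recurrence times given above can be sketched in a few lines of Python. This is only an illustration: the function name `recurrence_times` and the toy token list are assumptions introduced here, not part of the original analysis. Positions are 1-based, matching the convention $i = 1, 2, \ldots, N$.

```python
from collections import defaultdict


def recurrence_times(tokens):
    """Return {word: [tau_1, tau_2, ...]} for a sequence of tokens.

    Positions are 1-based, so tau_j = i_{j+1} - i_j is the number of
    words between two successive uses of the word, plus one.
    """
    last_position = {}          # most recent position of each word
    taus = defaultdict(list)    # recurrence times collected per word
    for i, word in enumerate(tokens, start=1):
        if word in last_position:
            taus[word].append(i - last_position[word])
        last_position[word] = i
    return dict(taus)


# Toy text (hypothetical) with "the" at positions 22, 41, 44 and 50,
# mirroring the example quoted in the text.
tokens = ["x"] * 50
for position in (22, 41, 44, 50):
    tokens[position - 1] = "the"

the_taus = recurrence_times(tokens)["the"]    # [19, 3, 6]
mean_tau = sum(the_taus) / len(the_taus)      # sample mean recurrence time
```

A single pass with a dictionary of last positions keeps the cost linear in the text length, which matters for corpora of the size considered here.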
The first point of interest is how the distribution $f(\tau)$ [or $F(\tau)$] deviates from the exponential distribution 

$$f_P(\tau) = \mu e^{-\mu\tau}, \qquad F_P(\tau) = e^{-\mu\tau}, \tag{1}$$

where $\langle\tau\rangle = 1/\nu$ leads to $\mu = \nu$. The exponential distribution is predicted by a simple bag-of-words model in which the probability $\mu$ of using the word is time independent and equals $\nu$ (a Poisson process with rate $\mu = \nu$), as observed if the words in the text are randomly permuted. Deviations are caused by the way that people choose their words in context. Numerous studies, as reviewed in Ref. [32], already demonstrate that language users dynamically modify their use of nouns and noun phrases as a function of the linguistic and external context. We analyze such modifications for all types of words. 
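
The random-permutation baseline mentioned above can be illustrated with a small sketch. The bursty toy text, the helper names, and the use of the coefficient of variation as a summary (it is close to 1 for an exponential distribution) are all assumptions made for this example, not the procedure of the original study.

```python
import random


def recurrence_times_of(tokens, word):
    """Recurrence times (in words) between successive uses of `word`."""
    positions = [i for i, w in enumerate(tokens, start=1) if w == word]
    return [b - a for a, b in zip(positions, positions[1:])]


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (about 1 for an exponential)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return variance ** 0.5 / mean


# Hypothetical bursty text: 10 bursts of 20 uses of "w", separated by
# long stretches of filler tokens.
bursty_text = (["w"] * 20 + ["x"] * 1980) * 10

# Null model: the same tokens in random order (fixed seed for repeatability).
shuffled_text = bursty_text[:]
random.Random(0).shuffle(shuffled_text)

cv_bursty = coefficient_of_variation(recurrence_times_of(bursty_text, "w"))
cv_shuffled = coefficient_of_variation(recurrence_times_of(shuffled_text, "w"))
```

Shuffling preserves the frequency of every word while destroying its placement in context, so any difference between the two summaries reflects placement alone.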

Figure 1 shows the empirical results obtained for the example words theory and also in the talk.origins group of the USENET database. Both words have $\langle\tau\rangle \approx 820$ but are linguistically quite different: while theory is a common noun, also is an adverb that functions semantically as an operator. The deviation from the Poisson prediction (1) is apparent in Fig. 1(a-c): $f(\tau)$ is larger than the exponential distribution for distances $\tau$ both much shorter and much longer than $\langle\tau\rangle$, while it is smaller for $\tau \approx \langle\tau\rangle$. Both words exhibit a most probable recurrence time $\tau < 20$ and a monotonically decaying distribution $f(\tau)$ for larger times [Fig. 1(a)]. Comparing the insets in Fig. 1(b), one sees that the occurrences of theory are clustered close to each other in a phenomenon known as burstiness [5J Q31 [T3J [HE HI]. Due to burstiness, the frequency of the word theory estimated from a small sample would differ a great deal as a function of exactly where the sample was drawn. Similar but lesser deviations are observed for the word also. 

Central to our discussion, Fig. 1 shows that the distributions of both words can be well described by the single free parameter $\beta$ of the stretched exponential distribution 

$$f_\beta(\tau) = a\beta\tau^{\beta-1} e^{-a\tau^\beta}, \qquad F_\beta(\tau) = e^{-a\tau^\beta}, \tag{2}$$

where $a = a_\beta = \left[\nu\,\Gamma\!\left(\frac{\beta+1}{\beta}\right)\right]^\beta$ is obtained by imposing $\langle\tau\rangle = 1/\nu$, $\Gamma$ is the Gamma function, and $0 < \beta < 1$. Distribution (2) is also known as the Weibull distribution, and similar stretched exponential distributions describe a variety of phenomena [HJ [23J [33J EH 111], including the recurrence time between extreme events in time series with long-term correlations [21 [36]. The stretched exponential (2) is more skewed than the simple exponential distribution (1), which corresponds to the limiting case $\beta = 1$, but less skewed than a power law, which is approached for $\beta \to 0$. 
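
As a numerical check on the normalization in Eq. (2), the sketch below evaluates $a_\beta$ and verifies that integrating $F_\beta$ recovers $\langle\tau\rangle = 1/\nu$. The function names, the trapezoidal integration, and the cutoff `tau_max` are illustrative choices made here; the values of $\beta$ and $\langle\tau\rangle \approx 820$ are those quoted for theory and also.

```python
import math


def weibull_a(nu, beta):
    """Scale factor a_beta = [nu * Gamma((beta + 1) / beta)]**beta of Eq. (2)."""
    return (nu * math.gamma((beta + 1.0) / beta)) ** beta


def survival(tau, nu, beta):
    """F_beta(tau) = exp(-a * tau**beta)."""
    return math.exp(-weibull_a(nu, beta) * tau ** beta)


def density(tau, nu, beta):
    """f_beta(tau) = a * beta * tau**(beta - 1) * exp(-a * tau**beta), tau > 0."""
    a = weibull_a(nu, beta)
    return a * beta * tau ** (beta - 1.0) * math.exp(-a * tau ** beta)


def mean_recurrence_time(nu, beta, tau_max=200_000, step=1.0):
    """Trapezoidal estimate of the integral of F_beta from 0 to tau_max."""
    n = int(tau_max / step)
    total = 0.5 * (survival(0.0, nu, beta) + survival(n * step, nu, beta))
    total += sum(survival(k * step, nu, beta) for k in range(1, n))
    return total * step


nu = 1.0 / 820.0    # frequency corresponding to <tau> = 820
mean_theory = mean_recurrence_time(nu, beta=0.46)
mean_also = mean_recurrence_time(nu, beta=0.85)
```

For $\beta = 1$ the Gamma factor equals 1 and `weibull_a` returns $\nu$, which is the exponential case of Eq. (1).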

A crucial test for the claim that an empirical distribution $F(\tau)$ follows a stretched exponential $F_\beta$ is to represent $-\log F(\tau)$ as a function of $\tau$ in a double logarithmic plot. The straight line behavior for almost three decades shown in Fig. 1(b), which is illustrative of the words in our datasets, provides strong evidence for the stretched exponential scaling (spam-related deviations for long $\tau$ are discussed in Text S1, Databases). This is a clear advance over the closest precedents to our results: (i) in Ref. [8] Zipf proposed a power-law decay, which would appear as a horizontal line in Fig. 1(b); (ii) Refs. [T4"l [15] compare two non-stationary Poisson processes for predicting the counts of words in documents (see Text S1, Counting Distribution); (iii) Ref. [T!5] proposes a non-homogeneous Poisson process for recurrence times, using a mixture of two exponentials with a total of four free parameters; (iv) Ref. [37] uses the Zipf-Alekseev distribution $f(\tau) \sim \tau^{a + b\ln(\tau)}$, which we found to underestimate the decay rate for large $\tau$ and to leave larger residuals than our fittings (see Text S1, Zipf-Alekseev Distribution). The stretched exponential distribution was found to describe the time between usages of words in blogs and RSS feeds in Ref. [24]. However, time was measured as actual time and the same distribution was found for different types of words, suggesting that those observations are driven by the bursty update of webpages, a related but different effect. More strongly related to our study is the analysis of email activity in Ref. [5], in which a non-homogeneous Poisson process captures the way one email can trigger the next. 
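
The straight-line test described above suggests a simple way to estimate $\beta$: since $\ln[-\ln F_\beta(\tau)] = \ln a + \beta \ln\tau$, the slope of a least-squares line in these coordinates is $\beta$. The sketch below applies this to synthetic data. It is not necessarily the fitting procedure of Text S1; the function name `fit_beta`, the trimming bounds, and the synthetic sample are assumptions made for illustration.

```python
import math
import random


def fit_beta(taus, lower=0.01, upper=0.99):
    """Estimate (beta, a) by least squares on ln(-ln F) versus ln(tau).

    F is the empirical fraction of recurrence times larger than tau.
    Only points with lower < F < upper are used (an illustrative choice)
    so that the noisy extremes do not dominate the fit.
    """
    ordered = sorted(taus)
    n = len(ordered)
    xs, ys = [], []
    for rank, tau in enumerate(ordered):
        survival = (n - rank - 0.5) / n     # empirical F(tau), never 0 or 1
        if tau > 0 and lower < survival < upper:
            xs.append(math.log(tau))
            ys.append(math.log(-math.log(survival)))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                        # slope of the straight line
    a = math.exp(mean_y - beta * mean_x)    # intercept is ln(a)
    return beta, a


# Synthetic check: Weibull-distributed recurrence times with known beta.
# random.weibullvariate(alpha, beta) has F(tau) = exp(-(tau / alpha)**beta),
# so a = alpha**(-beta) = 0.1 for alpha = 100 and beta = 0.5.
rng = random.Random(1)
true_beta = 0.5
samples = [rng.weibullvariate(100.0, true_beta) for _ in range(20_000)]
beta_hat, a_hat = fit_beta(samples)
```

On real text the short-time regime is dominated by syntactic effects, so the range of $\tau$ included in the fit would need to be chosen with care.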

Generative Model. Motivated by the successful description of the stretched exponential distribution (2), we search for a generative stochastic process that can model word usage. We consider the inverse frequency $\langle\tau\rangle$ as given and focus on describing how the words are distributed throughout the text. We assume that our text (abstractly regarded as arbitrarily long) is generated by a well-defined stationary stochastic process with finite $\langle\tau\rangle$ for the words of interest. We further assume that the probability $m(t)$ of using the word $w$ depends only on the distance $t$ since the last occurrence of the word. The latter means that we are modeling the word usage as a renewal process |34[ 136]. The distribution of recurrence times is then given by the (joint) probability of having the word at distance $\tau$ and not having this word for $t < \tau$: 

$$f(\tau) = m(\tau) \prod_{t=1}^{\tau-1} \bigl(1 - m(t)\bigr) \approx m(\tau)\, e^{-\int_0^\tau m(t)\,dt}.$$

The cumulative distribution function is written as 

$$F(\tau) = e^{-\int_0^\tau m(t)\,dt}. \tag{3}$$

FIG. 2: Dependence of $\beta$ on semantic Class and frequency for the 2,128 most frequent words of the USENET group talk.origins. Different classes of words (see Table I) are marked in different colors. (a) Fitting of $\beta$ exemplified for four words with $R^2 \approx R^2_{median} = 0.993$ (bottom to top): God, Class 1, $\beta = 0.39$, $\langle\tau\rangle = 586$; fundamentalists, Class 2, $\beta = 0.45$, $\langle\tau\rangle = 15,825$; listen, Class 3, $\beta = 0.56$, $\langle\tau\rangle = 21,971$; seemed, Class 4, $\beta = 0.67$, $\langle\tau\rangle = 19,564$. (b) Histogram of the fitted $\beta$, providing evidence that the Class is determinant to the value of $\beta$. (c) Quality of fit quantified in terms of the coefficient of determination $R^2$ between the fitted stretched exponential and the empirical $F(\tau)$ (see Text S1, Quality of Fit). The box-plots are centered at the median and indicate the 1, 2, 6, 7 octiles. For comparison, an exponential fit with two free parameters yields $R^2_{median} \approx 0.907$ (see Text S1, Deviation from the Exponential Distribution). (d) Relative dependence of $\beta$ on Class and $\langle\tau\rangle = 1/\nu$ (inverse frequency), indicating: running median on words ordered according to $\langle\tau\rangle$ (center black line) and 1-st and 7-th octiles (boundaries of the gray region); and running medians on words by Class (colored lines, Class 1-4, from bottom to top) with illustrative words for each Class. At each $\langle\tau\rangle$, large variability in $\beta$ and a systematic ordering by Class is observed. (e) Box-plots of the variation of $\beta$ for words in a given Class. The box-plots in the background are obtained using frequency to divide all words in four groups with the same number of words of the semantic Classes (first box-plot has words with lowest frequency and last box-plot has words with highest frequency). The classification based on Classes leads to a narrower distribution of $\beta$'s inside Class and to a better discrimination between Classes. 

The time dependent probability $m(t)$, also known as the hazard function, can be obtained empirically as $m(t) = f(t)/F(t)$ (see Text S1, Hazard Function). Equation (3) reduces to the exponential distribution (1) for a time independent probability $m(t) = \mu = 1/\langle\tau\rangle$. The stretched exponential distribution (2) is obtained from (3) by asserting that [54"l|5r51l55] 

$$m(t) = a\beta t^{\beta-1} \quad \text{for } 0 < \beta < 1. \tag{4}$$

This assertion means that in our model, the probability of using a word decays as a power law since the last use of that word. This is further justified by the power-law behavior of $m(t)$ determined directly from the empirical data, as shown in Fig. 1(c) and Text S1, Fig. 9, and is in agreement with results from mathematical psychology j39j HO] and information retrieval [40]. The Weibull renewal process we propose can be analyzed formally as a particular instance of a doubly stochastic Poisson process |4"T]. 
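
The renewal process defined by Eqs. (3) and (4) can be simulated directly. Because the hazard in Eq. (4) integrates to $a\tau^\beta$, the survival function is $F(\tau) = e^{-a\tau^\beta}$, and a recurrence time can be drawn by inverting it: $\tau = (-\ln u / a)^{1/\beta}$ with $u$ uniform on $(0, 1]$. The sketch below works in continuous time, as the text assumes for $\tau \gg 1$; the function names and the value of $\nu$ are hypothetical.

```python
import math
import random


def weibull_a(nu, beta):
    """a_beta = [nu * Gamma((beta + 1) / beta)]**beta, as in Eq. (2)."""
    return (nu * math.gamma((beta + 1.0) / beta)) ** beta


def sample_recurrence_time(rng, nu, beta):
    """Draw tau by inverting F(tau) = exp(-a * tau**beta)."""
    u = 1.0 - rng.random()                 # uniform on (0, 1]
    return (-math.log(u) / weibull_a(nu, beta)) ** (1.0 / beta)


def simulate_occurrences(rng, nu, beta, n_events):
    """Times at which the word is used: cumulative sums of i.i.d. taus."""
    times, t = [], 0.0
    for _ in range(n_events):
        t += sample_recurrence_time(rng, nu, beta)
        times.append(t)
    return times


rng = random.Random(42)
nu = 0.01                                   # hypothetical word frequency
occurrences = simulate_occurrences(rng, nu, beta=0.5, n_events=100_000)
mean_tau = occurrences[-1] / len(occurrences)   # should be close to 1 / nu
```

Drawing each recurrence time independently is exactly what makes this a renewal process: the only memory is the time elapsed since the last use.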



Our model is illustrated in Fig. 1(d) and can be interpreted as a bag-of-words with memory that accounts for the burstiness of word usage. This model does not reproduce the positive correlations between $\tau_j$ and $\tau_{j+p}$ El El I2S], which are usually small (less than 20% for $p = 1$) but decay slowly with $p$ (see Text S1, Correlation in $\{\tau_j\}$). These correlations quantify the extent to which the renewal model is a good approximation of the actual generative process, and show that the burstiness of words exists not only as a departure of $f(\tau)$ from the exponential distribution, but also as a clustering of small (large) $\tau$ jB] (see Text S1, Independence of $\{\tau_j\}$). The advantage of the renewal description is that the model (i) can be substantiated by a vast literature describing power-law decay of memory in agreement with Eq. (4), see Refs. [39J HQ] and references therein, and (ii) fully determines the dynamics (allowing, e.g., the precise derivation of counting distributions [38], which are used in applications to document classification [TH [T5] and information retrieval [10]). 
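
The correlation between $\tau_j$ and $\tau_{j+p}$ discussed above can be measured with a plain Pearson coefficient. The sketch below is an assumption about how such a check might look, not the analysis of Text S1; it uses synthetic renewal data, for which the correlation should be close to zero because successive recurrence times are independent.

```python
import random


def lag_correlation(taus, p=1):
    """Pearson correlation between tau_j and tau_{j+p} (requires p >= 1)."""
    xs, ys = taus[:-p], taus[p:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Independent Weibull recurrence times (scale 100, shape 0.5), standing in
# for the output of the renewal model.
rng = random.Random(7)
renewal_taus = [rng.weibullvariate(100.0, 0.5) for _ in range(20_000)]
r_renewal = lag_correlation(renewal_taus, p=1)
```

Applied to recurrence times from real text, a clearly positive value at small $p$ would indicate the residual clustering that the renewal model leaves out.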

Word Dependence. We have seen in Fig. 1 that the word-dependent deviation from the exponential distribution is encapsulated in the parameter $\beta$: the smaller the $\beta$ for any given word, the larger the deviation (see Text S1, Deviation from the Exponential Distribution). Next we investigate the dominant effects that determine the value of the parameter $\beta$ of a word. Previous research has observed that frequent function words (such as conjunctions and determiners) usually are closer to the random (Poisson) prediction while less frequent content words (particularly names and common nouns) are more bursty. These observations were quantified using: (i) an entropic analysis of texts [T5J; (ii) the variance of the sequence of recurrence times [17]; (iii) the recurrence time distribution [THl S2; and (iv) the related distribution of the number of occurrences of words per document [21 [15]. Because we have a large database and do not bin the datastream into documents, we are able to go beyond these insightful works and systematically examine frequency and linguistic status as factors in word burstiness. 

Our large database allows a detailed analysis of words that, despite being in the same frequency range, have very different statistical behavior. For instance, in the range $2,000 < \langle\tau\rangle < 3,000$, words with high $\beta$ ($\approx 0.80$) include once, certainly, instead, yet, give, try, makes, and seem; the few words with $\beta \lesssim 0.40$ include design, selection, intelligent, and Wilkins. Corroborating Ref. [14], it is evident that words with low $\beta$ better characterize the discourse topic. However, these examples also show that the distinction between function words and content words cannot be explanatory. For instance, many content words, such as the adverbs and verbs of mental representation in the list just above, have $\beta$ values as high as many function words. Here we obtain a deeper level of explanation by drawing on tools from formal semantics, specifically on type theory [571 SSI SI], and on dynamic theories of semantics [151 SB], which model how words and sentences update the discourse context over time. We use semantics rather than syntax because syntax governs how words are combined into sentences, and we are interested in much longer time scales over which syntactic relations are not defined. Type theory establishes a scale from simple entities (e.g., proper nouns) to high type words (e.g., words that cannot be described using first-order logic, including intensional expressions and operators). Simplifying the technical literature in the interests of good sample sizes and coding reliability, we define a ladder of four semantic classes, as listed in Table I. 



Class  Name                      Examples of words
1      Entities                  Africa, Bible, Darwin
2      Predicates and Relations  blue, die, in, religion
3      Modifiers and Operators   believe, everyone, forty
4      Higher Level Operators    hence, let, supposedly, the



TABLE I: Examples of the classification of words by semantic types. The primitive types are entities $e$, exemplified by proper nouns such as Darwin (Class 1), and truth values, $t$ (which are the values of sentences). Predicates or relations, such as the simple verb die, and the adjective/noun blue, take entities as arguments and map them to sentences (e.g., Darwin dies, Tahoe is blue). They are classified as $\langle e,t\rangle$ (Class 2). The notation $\langle x,y\rangle$ denotes a mapping from an element $x$ in the domain to the image $y$ [43, 44]. The semantic types of higher Classes are established by assessing what mappings they perform when they are instantiated. For example, everyone is of type $\langle\langle e,t\rangle,t\rangle$ (Class 3), because it is a mapping from sets of properties of entities to truth values [33]; the verb believe shares this classification as a verb involving mental representation. The adverb supposedly is a higher order operator (Class 4), because it modifies other modifiers. Following Ref. [44] (contra Ref. [43]) words are coded by the lowest type in which they commonly occur (see Text S1, Coding of Semantic Types). 

In Fig. 2 we report our systematic analysis of the recurrence time distribution of all 2,128 words that appeared more than ten thousand times in our database (for word-specific results see Table S1). We find a wide range of values for the burstiness parameter $\beta$ [$0.2 < \beta < 0.9$, Fig. 2(a,b)] and the stretched exponential distribution describes well most of the words [$R^2_{median} = 0.993$, Fig. 2(c)]. The Class-specific results displayed in Fig. 2(a-c) show that words of all classes are accurately described by the same statistical model over a wide range of scales, a strong indication of a universal process governing word usage at these scales. Figure 2(b) also reveals a systematic dependence of $\beta$ on the semantic Classes: burstiness increases ($\beta$ decreases) with decreasing semantic Class. 
FIG. 3: Stretched exponential recurrence time distributions observed in different databases. The databases consist of the documentary novel Os Sertões by Euclides da Cunha (S), in Portuguese ($N \approx 1.5 \times 10^5$); the USENET group comp.os.linux.misc (U) between Aug. 1993 and Mar. 2008 ($N \approx 6 \times 10^7$); the three Obama-McCain debates of the 2008 United States presidential election (D) arranged in chronological order ($N \approx 5 \times 10^4$); an English edition of the novel War and Peace by Leon Tolstoy (W) ($N \approx 6 \times 10^5$); and the first English edition of Isaac Newton's Principia (P) ($N \approx 2 \times 10^5$). All words appearing more than 100 times were considered in S (117 words), D (78 words), P (268 words), and W (633 words), whereas in U all 733 words appearing more than 10,000 times were used (see Text S1, Databases). (a) Recurrence time distributions for the words quase in S ($\beta = 0.88$, $\langle\tau\rangle = 1,204$, $R^2 = 0.996$), simple in U ($\beta = 0.71$, $\langle\tau\rangle = 3,397$, $R^2 = 0.996$), would in D ($\beta = 0.61$, $\langle\tau\rangle = 359.5$, $R^2 = 0.995$), voices in W ($\beta = 0.58$, $\langle\tau\rangle = 3,946$, $R^2 = 0.994$), and diameter in P ($\beta = 0.40$, $\langle\tau\rangle = 1,129$, $R^2 = 0.975$). (b) Histograms of the fitted $\beta$ for all datasets. Due to sample size limits, the analysis into semantic Classes is not feasible for the smaller datasets. (c) Box-plots of the coefficient of determination $R^2$ of the corresponding stretched exponential fit. 

This relation implies that words functioning unambiguously as Class 3 verbs should be less bursty than words of the same frequency functioning unambiguously as common nouns (Class 2). This prediction is confirmed by a paired comparison in our database: such verbs have a higher $\beta$ in 103 out of 116 pairs of verbs and frequency-matched nouns (sign test, $P < 8 \times 10^{-19}$). The relation applies even to morphologically related forms of the same word stem (see Text S1, Lemmatization): for 37 out of the 47 pairs of Class 3 adjectives and Class 4 adverbs in the database that are derived with -ly, such as perfect, perfectly, the adverbial form has a higher $\beta$ than the adjective form (sign test, $P < 5 \times 10^{-5}$). Figure 2(d) shows the dependence of $\beta$ on inverse frequency $\langle\tau\rangle$. This figure may be compared to the TF-IDF (term frequency-inverse document frequency) method used for keyword identification [14], but it is computed from a single document (see also Refs. [TH1 HZl HE]). Figure 2(d) reveals that $\beta$ is correlated with $\langle\tau\rangle$ and that the Class ordering observed in Fig. 2(b) is valid at all $\langle\tau\rangle$s. The detailed analysis in Fig. 2(e) demonstrates that semantic Class is more important than frequency as a predictor of burstiness (Class accounts for 0.32 and log-frequency for 0.26 of the variance of $\beta$, by the test proposed in Ref. [H]). 
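
The two sign tests quoted above (103 of 116 pairs, and 37 of 47 pairs) can be checked with an exact binomial tail. The source does not state whether the test is one- or two-sided; the sketch below assumes a one-sided test under the null hypothesis that either member of a pair is equally likely to have the higher $\beta$. Under that assumption both values fall below the reported bounds. The function name is hypothetical.

```python
from math import comb


def sign_test_p_value(successes, pairs):
    """One-sided sign test: P(X >= successes) for X ~ Binomial(pairs, 1/2)."""
    tail = sum(comb(pairs, k) for k in range(successes, pairs + 1))
    return tail / 2 ** pairs


# Verbs (Class 3) versus frequency-matched common nouns (Class 2).
p_verbs_vs_nouns = sign_test_p_value(103, 116)

# Class 4 -ly adverbs versus their Class 3 adjective stems.
p_adverbs_vs_adjectives = sign_test_p_value(37, 47)
```

Exact integer arithmetic with `math.comb` avoids the rounding problems that floating-point binomial coefficients would cause at probabilities this small.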

We are now in a position to discuss why burstiness depends on semantic Class. A straw man theory would seek to derive the burstiness of referring expressions directly from the burstiness of their referents. The limitations of such a theory are obvious: Oxygen is a very bursty word in our database ($\beta \approx 0.25$) though oxygen is ubiquitous. A more careful observer would connect the burstiness of words to the human decisions to perform activities related to the words. For instance, the recurrence time between sending emails is known to approximately follow a power law [3, 5]. However, in our database the word email is significantly closer to the exponential ($\beta \approx 0.5$) than a power law would predict ($\beta \to 0$). Indeed, a defining characteristic of human language is the ability to refer to entities and events that are not present in the immediate reality [48]. These nontrivial connections between language and the world are investigated in semantics. An insight on the problem of word usage can be obtained from Ref. [27], which establishes that the meaning and applicability of words with great logicality remain invariant under permutations of alternatives for the entities and relations specified in the constructions in which they appear. Here we consider permutability to be proportional to the semantic Classes of Table I. As a long discourse unfolds exploring different constructions, we expect words with higher permutability (higher semantic Class) to be more homogeneously distributed throughout the discourse and therefore have higher $\beta$ (be less bursty). Critical to this explanation is the fact that human language manipulates representations of abstract operators and mental states [49]. However, the overt statistics of recurrence times do not need to be learned word by word. It seems more likely that they are an epiphenomenal result of the differential contextualization of word meanings. The fact that the behavior of almost all words deviates from a Poisson process to at least some extent indicates that the permutability and usage of almost all words are contextually restricted to some degree, whether by their intrinsic meaning or by their social connotations. 

Different Databases. In Fig. 3 we verify our main results using databases of different sizes and characterized by different levels of formality. We analyzed a second example of a USENET group (U), a series of political debates (D), two novels (S, W), and a technical book (P) (for word-specific results see Table S1). The stretched exponential provides a close fit for frequent words in these datasets [Fig. 3(a,c)], and a wide and smoothly varying range of $\beta$s is observed in each case [Fig. 3(b)]. The technical book exhibits lower $\beta$ values, which can be attributed to the predominance of specific scientific terms. These datasets include examples of texts differing by almost four orders of magnitude in size, generated by a single author (books), a few authors (debates) or a large number of authors (USENET), in writing and speech (e.g., books vs. debates), and in different languages (e.g., novels), indicating that the stretched exponential scaling is robust with regard to sample size, number of authors, language mode, and language. 

Conclusions. The quest for statistical laws in language has been driven both by applications in text mining and document retrieval, and by the desire for foundational understanding of humans as agents and participants in the world. Taking texts as examples of extended discourse, we combined these research agendas by showing that word meanings are directly related to their recurrence distributions via the permutability of concepts across discourse contexts. Our model for generating long-term recurrence patterns of words, a bag-of-words model with memory, is stationary and uniformly applicable to words of all parts of speech and semantic types. A word's position along the range of the memory parameter of the model, $\beta$, effectively captures its position in between a power-law and an exponential distribution, thus capturing its degree of contextual anchoring. Our results agree with Ref. [49] in emphasizing both the specific ability to learn abstract operators and the broader conceptual-intentional system as components in the human capability for language and in its use in the flow of discourse. 

Analogies between communicative dynamics and social dynamics more generally are suggested by the recent documentation of heavy-tailed distributions in many other human driven activities [5]. They indicate that tracing linguistic activities in the ever larger digital databases of human communications can be a most promising tool for tracing human and social dynamics [22]. The stretched exponential form for recurrence distributions that derives from our model and the empirical finding it embodies are thus expected to also find applicability in other areas of human endeavor. 